Coin,
Unfortunately, after an "apparently" successful upgrade last night, the
SSH servers on the upgraded machines were not restarted correctly or
crashed. Consequently, it is no longer possible to log into Orfeo,
Toushirou, or Elwing.
A rescue action is planned this evening for Elwing and Toushirou, so
they should be available again soon. We have a plan for Orfeo too, but
I cannot tell you when it can be applied. Nevertheless, everything
should be all right in a few days.
For testing purposes, and also to switch to a newer kernel, these
machines will be restarted, so your sessions will die, sorry.
Maintaining and upgrading machines is a delicate and sometimes
difficult task. We really try to handle this process as smoothly as
possible and we are very sorry for the inconvenience.
Other services are unaffected. Do not hesitate to contact us by mail
(or via IRC) if you need assistance.
--
Marc Dequènes (Duck)
Coin,
On the evening of 2008-07-16, between 20:00 and 21:00, our main server
Orfeo is to be moved to a new space in our sponsor's architecture. A
short downtime is planned, during which several services will be
completely unreachable (primary mail services
MX1/IMAPS/POP3S/SIEVE/Webmail/..., NS1, IRC Services, Postgres, NTP).
Other services are replicated and should not be affected.
Have Fun !
--
Marc Dequènes (Duck)
Coin,
=== Toushirou Reboots ===
Even if the kernel seems less bloated, regular reboots are necessary.
Rest assured we only resort to this solution when there really are
unsolvable problems due to kernel oopses or breakage. One was necessary
a few days ago, so the next one should be in a month or two. This does
not really affect the availability of services, as it is only a matter
of minutes before everything is up again, but people using screen
sessions will be annoyed.
=== HQ Technical Problems ===
Due to an unfortunate accident in the HQ while moving in furniture,
both the firewall and the backup machine took a knock. As a result, the
firewall's hard drive is damaged, and even if the machine is still
working fine, it may well not survive a reboot. Losing the services
hosted on this machine should not be a problem, as there are no longer
any important external services that are only available there, but
please bear with delays on my part, as I may suddenly have no Internet
connection, or be busy repairing. The backup machine is not responding,
and my screen died as well, so I am currently blind. Judging by the
strange sounds coming from it, its hard drive seems to be out of order.
So, starting from now, you should not count on us having an available
backup of your data, and it might even be impossible to recover older
backups (as we are not rich enough to back up the backup).
Orfeo and Toushirou should continue working fine, but please back up
your crucial data in case we are not able to repair everything fast
enough before another problem arises.
May the heat stop injuring machines, and god bless them to avoid
disasters again...
--
Marc Dequènes (Duck)
Coin,
As usual, we have been so busy that we did not really have time to give
much news. Here are a few unsorted news items, to quickly fill the gap.
=== Security Problem ===
Due to a "BIG security issue"[1] in Debian, our software was quickly
upgraded to fix the issue, but this is not sufficient: keys and
certificates are weak and must be regenerated.
The SSH host keys are being regenerated today, and the new fingerprints
will be advertised in DNS via SSHFP entries (there is no validation of
such entries yet, but it is better than nothing). That is to say: you
SHOULD verify that you get the following message when you try to log
back into our machines after removing the old keys from
'~/.ssh/known_hosts' :
Matching host key fingerprint found in DNS.
A complete session initialization would look like the following :
# ssh root@toushirou.duckcorp.org
The authenticity of host 'toushirou.duckcorp.org (2001:7a8:800:6666::1)' can't be established.
RSA key fingerprint is 77:40:c9:c1:f3:cc:17:22:67:50:8d:3d:1f:39:bd:46.
Matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)?
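For those who prefer to double-check a fingerprint by hand, here is a
minimal sketch in Python of how it can be recomputed from a public key
line. The function names and the sample key blob are assumptions made
for illustration only. The MD5 form uses the colon-separated style ssh
prints in the session above, and the SHA-1 digest is what an SSHFP
record of fingerprint type 1 carries.

```python
import base64
import hashlib


def host_key_fingerprints(public_key_line):
    """Fingerprints of an OpenSSH public key line.

    The line is expected to look like "<type> <base64 blob> [comment]".
    Returns the colon-separated MD5 fingerprint and the SHA-1 hex digest
    of the decoded key blob.
    """
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])  # raw key blob, the data being hashed
    md5_hex = hashlib.md5(blob).hexdigest()
    md5_colon = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))
    return {"md5": md5_colon, "sshfp_sha1": hashlib.sha1(blob).hexdigest()}


def same_fingerprint(expected, shown):
    """Compare two fingerprints, ignoring case and surrounding spaces."""
    return expected.strip().lower() == shown.strip().lower()


# Made-up blob, NOT a real host key: it only demonstrates the output format.
sample_line = "ssh-rsa " + base64.b64encode(b"not a real key").decode() + " demo"
print(host_key_fingerprints(sample_line))
```

In practice the public key line would come from a copy of the host key
obtained over a trusted channel, and the "md5" value would be compared
with the fingerprint ssh displays before answering "yes".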
As for users' SSH keys, you should take care of them yourself, and
verify that yours are not weak with the tool provided by Debian. Weak
keys will soon be blacklisted and won't work anymore (and you know you
cannot log into DC's machines with passwords). If you cannot fix your
keys and '~/.ssh/authorized_keys' in time, just contact us so that we
can manually insert a new key.
The services' certificates are going to be regenerated soon too, and as
our CA (certificate authority) is not compromised, this should be
invisible to you.
=== HQ Unavailability ===
A few weeks ago, you might have experienced unavailability of the HQ
and of the few public services hosted there. The ADSL problem seems to
be solved, and we are trying to improve service redundancy for the
future.
=== Jabber Issues ===
As said above, we are trying to improve the availability of the
services hosted at the HQ, and Jabber is one of the most important
ones. We added another Jabber server to make a cluster, which is
totally invisible to the users. Unfortunately, it was more difficult
than expected, and the downtime was followed by a long period with
screwed-up rosters. We managed to reinject the rosters, but from a not
so fresh backup. Most of our contacts were retrieved, though, and it
should be working fine now.
=== Backup Downtime ===
We just managed to restart our backup server, which had been down for
about 3 weeks because of hardware problems. It seems fresher stuff is
needed here.
=== Homes Synchronization ===
Arnau did a good job coding a nice script which synchronizes hidden
files/directories in your home (those beginning with a dot, mostly
configuration files) across our machines daily. People allowed to log
into several machines won't have to manually copy their configuration
files, keys and so on.
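To see which entries of a home directory fall under this "begins with a
dot" rule, a small sketch in Python follows. It is only an illustration
of the selection criterion, not the synchronization script itself, and
the function name is an assumption.

```python
import os
import tempfile


def hidden_entries(home):
    """Return the sorted names in `home` that begin with a dot."""
    return sorted(name for name in os.listdir(home) if name.startswith("."))


# Demo on a throwaway directory rather than on a real home.
with tempfile.TemporaryDirectory() as demo_home:
    for name in (".bashrc", ".ssh", "notes.txt"):
        open(os.path.join(demo_home, name), "w").close()
    print(hidden_entries(demo_home))  # prints ['.bashrc', '.ssh']
```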
Have FUN... :-/
[1] http://www.us.debian.org/security/2008/dsa-1571
--
Marc Dequènes (Duck)
Coin,
=== Mail Problems ===
As we were preparing Orfeo to host the MX1 services, we decided to
upgrade the software (the Dovecot part). A custom package is needed, and
a mistake was made, which would not have been a big deal if another
piece of software (DSPAM) did not have a silly bug. We were warned
quickly of this issue, but still a few (not many) mails were rejected
with an "unknown user" reason. We then had to block incoming mail for a
while to fix this annoying situation. Sorry for this mess :-(.
=== Orfeo NG ===
It seems I forgot to say Orfeo is back alive in a new body. Same spirit
inside, really !
Orfeo is gonna take over his preferred services, unloading Toushirou in
the meantime.
In fact, the problems with Toushirou, which are the reason for the
coming maintenance, accelerated the need to get another machine. Here it
is now, allowing us to analyse Toushirou without shutting down
everything.
=== Mail Teleportation ===
Mail is gonna move to Orfeo, before Toushirou's maintenance. We have
been delayed a bit by the previous issue, but it may happen on Monday
(yes, it is still Sunday here), or just after eating the last New Year's
Eve chocolates. So be prepared for a short but complete mail services
downtime.
=== Sieve ===
If you forgot what SIEVE is, then read this:
http://en.wikipedia.org/wiki/Sieve_%28mail_filtering_language%29
Information on available features can be found here:
http://wiki.dovecot.org/LDA/Sieve
(we are using Dovecot 1.0.x)
The previous software upgrade brings several sieve/managesieve fixes,
which should make your life a bit easier. Until now managesieve was an
experimental feature, but you may now consider it production
level. Experiments with different clients, like IMP, still need to be
done, but we already managed to get sieve-connect (in the Debian package
of the same name) to work with a small patch which can be found here:
ftp://ftp.duckcorp.org/duckcorp/patches/sieve-connect_dovecot.patch
You can use it either as a one-time command or as an interactive
shell, like this :
sieve-connect -s sieve.duckcorp.org -u <user>
This allows uploading/downloading/viewing sieve scripts in a special
per-user directory. You may then activate the script you want, which
will then be used for incoming mail instead of the current
script. Don't worry, your former script (from before you used the
managesieve activate feature) is not lost, but only moved into the
special directory under the name "dovecot.orig". In fact, from then on
the active script is a symlink to the activated script in your special
directory. You then no longer need shell access to manage your
scripts.
Once Dovecot 1.1 is out, you'll be able to use the 'include' sieve
feature, so having multiple scripts manageable on the server will
really make sense.
=== Communication Services ===
Beware, BitlBee is gonna move to Orfeo soon too. So check that you are
using the proper bitlbee.duckcorp.org alias.
The Bip IRC bouncer is still down, but we hope to have it up again on
Orfeo. Maybe the bip problems were related to the strange problems on
Toushirou.
=== DNS move ===
As you may have seen, our NS1 is already Orfeo. Toushirou is meant to
act as the NS2/NS3 (dual connectivity). Toushirou still needs a few
cleanups, but it is not a big deal. Nevertheless, zone modifications
are no longer possible for users with shell access at the moment. Do
not hesitate to ask us for modifications by mail.
Enough for today...
--
Marc Dequènes (Duck)
Hello,
We are going to stop the services running on Toushirou between 30th
December and 4th January in order to run diagnostic tests (we
experienced a few problems with Toushirou). Only the following services
will still be available, through Orfeo:
- IRC
- Mail
(Webmail and Mailing Lists as well).
- BitlBee
(it will move permanently to Orfeo and will be available on
bitlbee.milkypond.org, so you should use this hostname; the password
is the same as the one you are using on DC)
- NS1
Sorry for the inconvenience.
Regards,
Arnaud Fontaine (on behalf of the DuckCorp team)
Coin,
As announced, the last migration happened this weekend and a few things
were fixed today. This was quite a difficult task, and we had several
extra problems...
=== The Mail ===
The mail routing with the new antispam software was a bit difficult to
set up, so mail was only reopened on Sunday night. The software should
have sent you a welcome message with all the important information about
how to manage your custom antispam settings (or will do so when you
receive your first mail in your new box).
All previous mails were moved out of Orfeo before it died. It seems we
are cursed, as its CPU or motherboard is not working properly anymore,
and its future has yet to be discussed. The box Finger provided to move
the data had boot problems during the migration, and died a few hours
later.
The good news is : nothing is lost, everything was moved or is still
available, as Orfeo's disks are alive. You should then find all your
previous mails in your new box. IMAPS/POP3S/Webmail/Webdesk access is
still available and working. The automatic fetchmail process has also
been back since this afternoon.
For the people still using direct mailbox access, it is high time to
switch to maildirs for good, as it is now the only delivery backend
available.
The procmail rules were translated to sieve, with only minor problems,
and are working well, except for the ${n} regex replacements. We are
working on this issue. A ManageSieve server is available at
sieve.duckcorp.org, but it seems quite hard to make clients
interoperate with our sieve implementation. Nevertheless, the Webdesk
was reconfigured to use the sieve filtering rules ; it is working quite
well, except for the "Show Active Script" action, which is not very
important.
The mailing lists are working too, and have been spam-filtered since
this afternoon. Special users are available for each virtual domain, so
that custom antispam preferences can be set up for them, as for any
other mail user. It may be possible to give per-list custom filtering
access in the future, but we do not believe this is a needed feature, so
we won't work in this direction unless user requests say otherwise.
=== Shells ===
Only a few shell accesses were reopened. The new policy is not yet
decided, but if you need your access back, we will kindly consider
reopening it. This way, by requiring explicit requests, we hope to close
unused resources.
=== Communication ===
Our BitlBee server was migrated and upgraded, and is now available
worldwide on port 6668 as usual (over IPv6 too). Ask us if you need an
account.
The new IRC Services are coming real soon now ; Nohar kindly prepared
packages for us.
So, I have surely forgotten plenty of things, but it will be clearer
once we get some sleep. Do not hesitate to contact us if you need
assistance.
--
Marc Dequènes (Duck)
Coin,
The last big part of the migration is coming soon. Fasten your seat
belts and pray to god :-).
=== Mail ===
On Friday evening, all your mailboxes are moving to the new server, and
during the weekend all mail functions will follow. That is to say: you
won't have any mail access during the weekend, starting from the middle
of Friday afternoon. Incoming mails will then be stored on the secondary
MX servers. Mail delivery and IMAPS access are gonna be restored as soon
as possible ; webmail and mailing list access will follow.
So, send every important mail before Friday afternoon, store the
messages you need to read using the offline mode of your mail client,
and use an external email address in case of emergency. Everything
should be restored by the end of the weekend, but we cannot say exactly
when.
For people using procmail rules, please help us convert your rules to
a sieve script.
=== Web ===
It seems everything went fine for the remaining sites. FTP access for
modifying your site is a work in progress and should be available
soon. Shell accesses will be reexamined and reopened _later_ for people
having real needs.
Webstats are disabled for now ; we plan to reopen them, but this is not
a top-priority task.
=== Communication Services ===
Our new IRCd is behaving quite well. IRC Services should come soon, as a
package is nearly ready, thanks to Nohar.
The IRC proxy, bip, has already been working well for a few days, and
logs are available through the private part of the FTP server.
BitlBee is more of a problem to move, as local access is currently
necessary. Either we will give you back a shell account, or we will
allow your IPs to connect to it. It should be moved during the weekend
or in the following week.
Our Jabber server is not moving yet, and perhaps not before Orfeo is
back for real. Anyway, we at least plan to link it with our new
database, so that adding new accounts becomes quite painless. This is
gonna happen soon, during a night. We should be able to send a
server-wide message to all users, or at least warn as many people as we
can on IRC.
=== Homedirs and other data ===
For people having shell access, we are moving your homedirs at the same
time as the mails. All shell accesses will be closed at least until the
end of the weekend. Depending on your real needs, shell access may or
may not be reopened ; how we should allow access to this service is
still an open discussion between admins. In any case, we won't leave you
with your data locked away, but do think about copying the things you
might need during the weekend.
Missing FTP data is moving with the mails and homedirs, and will be
accessible quite quickly.
=== RCS ===
RCS data was moved, and we are working on giving you proper access to
it. Arch (tla/baz) archives are already viewable through archzoom, but
this is all that is working at the moment. People whose shell access
is restored will be able to use it in read/write mode. This service may
be interrupted when we need to run some tests, which makes files
unreachable, so it is not yet a reliable service.
OK, that should be enough for this time. Do not hesitate to ask
questions on the #DuckCorp channel on IRC. Next checkup on Sunday or
Monday, by mail if everything went fine ;-).
--
Marc Dequènes (Duck)
Coin,
Sorry for not being talkative enough about the progress, but we were
quite busy solving several issues, and quite tired too.
=== Centralized Accounts ===
As we are now splitting services among multiple machines (in fact Elwing
was also used to help Orfeo, but mostly for totally separate services),
the need for account centralization was critical. This way we won't have
to create accounts on every machine where a user needs services, nor
synchronize information (like passwords) by hand, as everything is gonna
be propagated automatically.
This task is not fully complete : all accounts were created in the LDAP
database, but some information needs to be added for each reactivated
service. This is partly why things are taking a bit longer than
expected.
=== The Web ===
Webmail / Webdesk services were put back quickly, only a few days ago,
as a priority. Other sites were then slowly switched on.
As I said, many (lightweight) sites are now located on LeChat (RtpNet
machine), so a database multi-master replication was necessary and a bit
difficult to configure. Now this difficulty is over and we are switching
on the other sites one by one. Don't worry about SQL hostname
modifications : either we are doing them for you, or we are contacting
you for help.
The web migration should be over in one or two days.
=== Databases ===
As said previously, MySQL is available on both Tōshirō and LeChat,
meaning applications (not only websites) are easily relocatable to
balance load.
Recently a PostgreSQL database was installed on Tōshirō to provide
access to a more serious database software. So you may ask for an
account in a few days/weeks. No replication is planned yet, so
applications outside Tōshirō won't be able to access the database yet.
You may also ask for an LDAP database the same way. LDAP is being
replicated on all MilkyPond hosts involved in user services.
The phpMyAdmin tool is available again, with phpPgAdmin and
phpLDAPadmin, on the following URL :
https://db.duckcorp.org/
Beware experienced users ! The MilkyPond LDAP database is not yet ready
for user access, as we are regenerating the content frequently. So any
modification would be lost forever.
Note that the sql.duckcorp.org DNS entry, and the corresponding website,
are gonna disappear soon. More on the new DNS hostnames later, as they
are still under discussion.
=== FTP storage ===
Both private and public data were moved from Orfeo, except for the
HurdFr public data (which will be moved in the next data move). FTP
profiles were activated, as they were on Orfeo.
=== Chat ===
A new IRCd software (with its services) is gonna be tested soon. Even
without services ready, we will probably switch if everything is OK, so
check your notices and reconnect if you find yourself alone in the
channels.
The (bip) IRC bouncer was moved to Tōshirō yesterday. Logs are available
in the FTP storage.
=== Network ===
IPv6 is back online, with broker services. Our old broker IP range is
now routed to Tōshirō, so you won't have to change your addresses and
DNS entries, only the endpoint IPv4 address (using tb.duckcorp.org).
Filtering rules have been strengthened a bit, by the way.
=== Mail and homedirs ===
Another BIG move is left : moving mail to the new architecture. The fact
is we were already working on improving the routing and processing
capabilities, as well as the anti-spam methods, but we are now running
short of time to make all this work well together.
I won't talk about new features yet, but just say a word about the
major change : the anti-spam system is being switched to the DSPAM
software, which means we will have to deal with plenty of spam until it
is trained. The good news is : it should be much easier for users to
manage, and it allows quite a lot of customization. It was also time
to split our training database, to allow "per user" spam filtering, as
we don't always agree on what is spam or not.
As soon as it is ready, we are planning a quick move, meaning you won't
be able to access your mailbox during a short period of time. Incoming
mails won't be lost, as they will be taken care of by our secondary
MX. I can only advise you to check your mail at least daily, or pop up
on IRC, to be informed when it is happening. We will surely target late
night and/or the weekend.
Homedirs will be affected too, as we plan to move data at the same
time. But this is not the only reason : homedirs contain maildirs and
sometimes mail routing configurations. We plan to :
- replace procmail rules with sieve scripts
- move fetchmail rules into the private section of the FTP storage and
update the current script to use them, until a better solution is
found
If other changes/difficulties occur, we will inform you as soon as
possible.
For users having procmail rules, we would be very grateful if you could
help us convert your rules to sieve. We don't want to deal with your
personal mailing stuff, and this could save us a lot of precious time.
Sieve is a scripting language for mail processing, with extended
features compared to procmail capabilities. It is described here :
http://en.wikipedia.org/wiki/Sieve_%28mail_filtering_language%29
The mail software we plan to use does not yet implement the whole Sieve
language ; it is able to understand the following features:
- fileinto
- reject
- envelope
- vacation
- imapflags
- notify
- regex
- subaddress
- relational
Do not hesitate to contact us if you have any problem.
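To give an idea of what a converted rule looks like, here is a minimal
sketch in Python which prints the Sieve script for the most common
procmail pattern, "file mail whose header contains some text into a
folder". The function name, the header value and the folder name are
assumptions chosen for illustration, and only the 'fileinto' feature
from the list above is used.

```python
def sieve_fileinto_rule(header, needle, folder):
    """Build a Sieve script filing mail whose `header` contains `needle`.

    Roughly what a procmail recipe such as
        :0
        * ^List-Id:.*dc-admins
        lists.dc-admins/
    becomes, with a substring match instead of a regular expression.
    """
    def quote(text):
        # Sieve quoted strings escape backslashes and double quotes.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return (
        'require ["fileinto"];\n'
        f"if header :contains {quote(header)} {quote(needle)} {{\n"
        f"    fileinto {quote(folder)};\n"
        "}\n"
    )


# Hypothetical values: adapt the header, text and folder to your own rule.
print(sieve_fileinto_rule("List-Id", "dc-admins", "lists.dc-admins"))
```

Rules relying on regular expressions would need the 'regex' feature
instead of ':contains', and are not covered by this sketch.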
It is late and I may have dropped things through my strainer-memory, so
I will complete this checkup in a future post.
Oyasumi nasai !
--
Marc Dequènes (Duck)
Coin,
=== News ===
Tōshirō was prepared and successfully installed in its definitive
location today, thanks to Yok, Nefou, and people from Hivane and Sivit.
Moving critical services has started. Our NS and NTP moved a few
minutes ago. While this is transparent for NTP, you may have to make
some changes in your zone settings (see the following section if you use
this service).
Mail and Web migrations are being prepared. Mail won't move for a few
days, because a few architecture changes and improvements will take
place, and because much testing is needed. You'll be warned when this is
gonna happen. Web pages should be more easily configured on the new box,
but because of the large amount of data, a solution needs to be found to
avoid taking years with the ADSL upload limitations. So, please be
patient.
=== NS Update ===
Tōshirō NS is available on two IPs:
- ns1.duckcorp.org (replacing Orfeo)
- ns2.duckcorp.org
This provides a "piece of redundancy", for network failures only.
You can ask for another NS (provided by Hivane) if you need full
redundancy.
When Orfeo is back in a datacenter, another NS will then be available.
A) If you have a master zone hosted:
A.1) for the master zone:
/!\ It is no longer possible to edit your zone via shell access. A new
method will be available in the future, but this will have to wait
until the situation is all back to normal. If you need any change, then
ask us via mail or IRC.
If your registrar is Gandi and you gave us technical rights on your
zone, then everything was already done for you. Skip to B.2.
If not, then you should add ns2.duckcorp.org to your NS list for better
redundancy (in the registrar database only, the zone was already updated
by DC admins).
A.2) for the external slave zone(s):
Please update the masters, which are now :
- 193.200.42.177
- 80.248.213.245
B) If you have a slave zone hosted:
B.1) for the slave zone:
There is nothing to do.
B.2) for the external master zone:
Please update the IPs allowed for transfers with :
- 193.200.42.177
- 80.248.213.245
Moreover, you should add ns2.duckcorp.org to your NS list for better
redundancy (in the zone and in the registrar database).
Beware ! For those who had the unwise idea to use this kind of
configuration for their master zone:
---
@ NS ns1.mydomain.tld.
@ NS ns2.mydomain.tld.
ns1.mydomain.tld. A 1.2.3.4
ns2.mydomain.tld. A 195.5.254.194
---
First, ns2.mydomain.tld. does not really exist, so you'll have to change
the server IP each time ns1.duckcorp.org moves. I agree this is a nicer
name for your domain, but people reading this kind of technical
information will soon understand you've got no real NS behind this
name. Moreover, either you've added the corresponding glue record, and
you are polluting important NS servers, or you didn't and you will
surely experience strange behaviors, or even have your whole domain
unavailable. In either case, this is *WRONG*, correct it ! The right
configuration is:
---
@ NS ns1.mydomain.tld.
@ NS ns1.duckcorp.org.
@ NS ns2.duckcorp.org.
ns1.mydomain.tld. A 1.2.3.4
---
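A quick way to spot the wrong pattern in your own zone is sketched below
in Python. It assumes the zone has already been reduced to simple
(name, type, value) tuples and that you know which IPs belong to the
hosting provider. The function name is hypothetical, and the provider IP
set here simply reuses the address from the example above.

```python
def fake_ns_names(records, provider_ips):
    """Return NS targets whose A record points at a provider IP.

    `records` is a list of (name, type, value) tuples. A name is flagged
    when it is used as an NS target and its A record is one of
    `provider_ips`, i.e. it only disguises the provider's name server.
    """
    ns_targets = {value for (_, rtype, value) in records if rtype == "NS"}
    return sorted(
        name
        for (name, rtype, value) in records
        if rtype == "A" and name in ns_targets and value in provider_ips
    )


provider_ips = {"195.5.254.194"}  # illustrative, taken from the example

wrong_zone = [
    ("@", "NS", "ns1.mydomain.tld."),
    ("@", "NS", "ns2.mydomain.tld."),
    ("ns1.mydomain.tld.", "A", "1.2.3.4"),
    ("ns2.mydomain.tld.", "A", "195.5.254.194"),
]
right_zone = [
    ("@", "NS", "ns1.mydomain.tld."),
    ("@", "NS", "ns1.duckcorp.org."),
    ("@", "NS", "ns2.duckcorp.org."),
    ("ns1.mydomain.tld.", "A", "1.2.3.4"),
]

print(fake_ns_names(wrong_zone, provider_ips))  # prints ['ns2.mydomain.tld.']
print(fake_ns_names(right_zone, provider_ips))  # prints []
```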
=== ... ===
If anything in this mail is unclear or if you need assistance with using
our services while maintenance is in progress, ask us via mail or IRC.
Stay tuned...
--
Marc Dequènes (Duck)